Identifying Effective Translations for Cross-lingual Arabic-to-English User-generated Speech Search
Authors
Abstract
Cross Language Information Retrieval (CLIR) systems are a valuable tool to enable speakers of one language to search for content of interest expressed in a different language. A group for whom this is of particular interest is bilingual Arabic speakers who wish to search for English language content using information needs expressed in Arabic queries. A key challenge in CLIR is crossing the language barrier between the query and the documents. The most common approach to bridging this gap is automated query translation, which can be unreliable for vague or short queries. In this work, we examine the potential for improving CLIR effectiveness by predicting the translation effectiveness using Query Performance Prediction (QPP) techniques. We propose a novel QPP method to estimate the quality of translation for an Arabic-English Cross-lingual User-generated Speech Search (CLUGS) task. We present an empirical evaluation that demonstrates the quality of our method on alternative translation outputs extracted from an Arabic-to-English Machine Translation system developed for this task. Finally, we show how this framework can be integrated in CLUGS to find relevant translations for improved retrieval performance.
Similar resources
LiveTrans-Cross-Language Web Search through Live Mining of Query Translations
Enabling users to find effective translations automatically for query terms not included in dictionary is one of the major goals of a practical cross-language Web search service. This paper presents a cross-language Web search system called LiveTrans, which is an experimental metasearch engine that provides English-Chinese cross-lingual retrieval of both Web pages and images. The system has bee...
Cross-lingual sentence extraction for information distillation
Information distillation aims to analyze and interpret large volumes of speech and text archives in multiple languages and produce structured information of interest to the user. In this work, we investigate cross-lingual information distillation, where non-English (source language) documents are searched for user queries that are in English (target language). We propose to perform distillation ...
A Scalable Video Search Engine Based on Audio Content Indexing and Topic Segmentation
One important class of online videos is that of news broadcasts. Most news organisations provide near-immediate access to topical news broadcasts over the Internet, through RSS streams or podcasts. Until lately, technology has not made it possible for a user to automatically go to the smaller parts, within a longer broadcast, that might interest them. Recent advances in both speech recognition ...
A Comparative Analysis of Collocation in Arabic-English Translations of the Glorious Quran
The Qur’an is the only holy book of Muslims all around the world. Every person, whatever their religion and language, is interested in comprehending and accepting the rules and regulations of their own belief. Translation of the Qur’an is only an attempt to present its meaning. One of the greatest challenges in translating the Qur’an is collocation. A collocation is a sequence of words or terms that co-o...
Identifying Agreement/Disagreement in Conversational Speech: A Cross-Lingual Study
This paper presents models for detecting agreement/disagreement between speakers in English and Arabic broadcast conversation shows. We explore a variety of features, including lexical, structural, durational, and prosodic features. We experiment with these features using Conditional Random Fields models and conduct systematic investigations on efficacy of various feature groups across language...